Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate
نویسندگان
چکیده
The function of a protein is of great interest in the cutting-edge research of biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, the prediction of BLAST is based on protein sequence similarity, while that of machine learning is also based on the sequence, but without the consideration of their similarity. This unique characteristic of machine learning makes it a good complement to BLAST and many other approaches in predicting the function of remotely relevant proteins and the homologous proteins of distinct function. However, the identification accuracies of these in silico methods and their false discovery rate have not yet been assessed so far, which greatly limits the usage of these algorithms. Herein, a comprehensive comparison of the performances among four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on the independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, the substantially higher sensitivity of SVM and BLAST was observed compared with that of PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in the related biomedical research.
منابع مشابه
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of the protein localization is among the most important issues in the bioinformatics that is used for the prediction of the proteins in the cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied for the prediction of the intracellular protein locations. These algorithms use the features extracted from pro...
متن کاملThe False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data
Background and Objectives: In recent years, new technologies have led to produce a large amount of data and in the field of biology, microarray technology has also dramatically developed. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...
متن کاملTUNNEL BORING MACHINE PENETRATION RATE PREDICTION BASED ON RELEVANCE VECTOR REGRESSION
key factor in the successful application of a tunnel boring machine (TBM) in tunneling is the ability to develop accurate penetration rate estimates for determining project schedule and costs. Thus establishing a relationship between rock properties and TBM penetration rate can be very helpful in estimation of this vital parameter. However, this parameter cannot be simply predicted since there ...
متن کاملImproving biological activity prediction of protein kinase inhibitors using artificial neural network and partial least square methods
Introduction: Protein kinase causes many diseases, including cancer; therefore, inhibiting them plays an important role in the treatment of many diseases. Traditional discovery inhibitors of this enzyme is a time-consuming and costly process. Finding a reliable computer-aided drug discovery tools which can detect the inhibitors will reduce the cost. In this study, it is attempted to separate ki...
متن کاملImproving biological activity prediction of protein kinase inhibitors using artificial neural network and partial least square methods
Introduction: Protein kinase causes many diseases, including cancer; therefore, inhibiting them plays an important role in the treatment of many diseases. Traditional discovery inhibitors of this enzyme is a time-consuming and costly process. Finding a reliable computer-aided drug discovery tools which can detect the inhibitors will reduce the cost. In this study, it is attempted to separate ki...
متن کامل